The Evocativity of Prediction, LLMs and DeepSeek
In December 2022, I got an unprecedented number of calls and messages from friends, entrepreneurs, students, and colleagues asking me if I knew about language models and urging me to meet them. They said: "I type, and the language model (i.e., ChatGPT) predicts and completes my sentences and requests!" A mixed sense of void, excitement, and agitation was spreading fast. The term "language model" was suddenly popularized all over the planet. How did we get here?
In the early nineties, we witnessed the dawn of statistical language models. At that time, when I talked about our research on "stochastic language models" at AT&T Bell Laboratories (USA), people would scoff at the idea. We wanted to use them to empower machines to talk to humans. The topic was controversial then, and linguists criticized and ridiculed our research path. In 2000, at AT&T Labs, we had an aha moment when we connected a machine that could listen and talk to the millions of customers calling with accents from all over the USA. It was an instant research and technological breakthrough, appreciated by the scientific and business communities, but with little impact on the broad audience of consumers.
Fast forward in time: in 2011, there was another important breakthrough that impacted society. Millions of people could talk to Siri through their iPhones, and later, in 2014, Amazon launched Alexa to familiarize people with a different type of interaction at home or in the car. People worldwide started to explore what they could get out of talking to machines: information, all sorts of tasks, personal support, recommendations, etc. This is how the anthropomorphization of human-machine interaction works, and humans may forget that they are talking to machines, artificial entities. Back then, language models were buried in complex system architectures and hidden from the end user.
Since the end of 2022, Large Language Models (LLMs) have been exposed to interact directly with users, amusing teachers in search of the solution to a math problem, students asking for help in writing essays, white-collar employees writing reports, lawyers drafting contracts, young programmers writing code, and so on. The ability of language models to predict the next word(s) has been scaled to reach virtually everybody in the world.
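To make the idea of next-word prediction concrete, here is a minimal sketch in Python. It uses the Hugging Face transformers library and the small, openly available GPT-2 model as stand-ins (both are my choices for illustration, not systems discussed above); today's LLMs do essentially the same thing with vastly more parameters, data, and compute.

# Minimal sketch of next-word prediction with a small open model (GPT-2).
# Assumes the Hugging Face `transformers` and `torch` packages are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "I type, and the language model"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # one score per vocabulary token, per position

# Turn the scores for the last position into a probability distribution
# over the next token, then show the five most likely continuations.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}: {prob.item():.3f}")

Chaining such predictions, one token at a time, is what produces the fluent completions that surprised my callers.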
LLMs have been known to require massive amounts of data and computing resources to train. That is why very few companies, mainly in the USA, have been able to build them, although the techne (Τέχνη) is public and known by the research community. Researchers and practitioners monitor the advancement and performance of LLMs through published benchmarks and leaderboards maintained by third parties. One of the most closely watched is the Chatbot Arena (*1), where users challenge LLMs with questions, problems, or requests through the user interface and vote on the responses. The process is anonymous (with respect to the LLMs' identity) and dynamic, as users' votes are continuously submitted and registered.
For the last two years, the top 10 of the leaderboard has been occupied by models from big USA tech companies. Throughout this time, the prevailing narrative was that LLMs could only be built by a few USA companies and people: those few who could access humongous amounts of data, specialized hardware, and computing platforms. In January 2025, this narrative took a different turn. DeepSeek, a Chinese startup largely unknown to the world, entered the top 10 of the leaderboard for the first time. Most people did not see that coming, although there are both known and secretive efforts to build LLMs worldwide. But the news was not just about making room in the group of high-performance LLMs; two other aspects were just as relevant. First, DeepSeek claims that its models were trained on cheaper hardware, with efficient algorithms that reduce the overall amount of computation. This shook the financial markets at first, put specialized hardware companies in the spotlight, and, last but not least, openly challenged the USA's primacy in funding, training, and sourcing state-of-the-art LLMs. Second, the company released its DeepSeek-R1 model as an open-source model, while most competing models are closed. When you release an open-source model, researchers, practitioners, and companies can exploit it for their own purposes under the license terms (the MIT license in this case), enabling wide and diversified adoption. Even though DeepSeek's model has been heavily scrutinized and criticized on methodological and security grounds, the bottom line is that, as we know in scientific circles, the original narrative was exaggerated at best. Two years after the release of ChatGPT to the world, DeepSeek has proven that there is a vast space for new research and technological innovation outside the established technological hubs.
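To make the open-release point concrete, here is a minimal sketch of how a researcher or a company could run an openly released model locally. The library (Hugging Face transformers) and the specific distilled DeepSeek-R1 checkpoint name are my assumptions for illustration; any other open-weight model would be used in the same way.

# Minimal sketch: reusing openly released model weights under their license terms.
# The library (Hugging Face transformers) and the model identifier below are
# illustrative assumptions; the point is that open weights can be downloaded,
# run, fine-tuned, or embedded in products by anyone who follows the license.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Why do open model weights matter for research and industry?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))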
Researchers and entrepreneurs are thriving in North America and Asia, thanks to private and public funding for AI. In Europe, decision-makers' focus and public funding are directed at regulating AI. This approach follows a narrative that stems from a sense of fear towards AI and, above all, downplays the resources for AI research and its important impact in areas such as medicine and health. The emergence of DeepSeek may be a stimulus for a new generation of European private investors and entrepreneurs that we have not seen in a very long time. The talent pool in Europe is vast, and we even export it tariff-free.
(*1) https://lmarena.ai/